Enhancing image quality in an image system

ABSTRACT

A technique for improving image compression by pre-processing the image frames. In particular, methods for de-interlacing and noise reduction using combinations of median filters, applied both spatially and temporally, with and without motion analysis, are described.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a divisional of and claims priority to U.S. application Ser. No. 09/545,233 filed on Apr. 7, 2000, now U.S. Pat. No. 6,728,317 (which is incorporated herein in its entirety), which was a continuation-in-part application of U.S. application Ser. No. 09/442,595 filed on Nov. 17, 1999, now abandoned, which was a continuation of U.S. application Ser. No. 09/217,151 filed on Dec. 21, 1998 (now U.S. Pat. No. 5,988,863, issued Nov. 23, 1999), which was a continuation of U.S. application Ser. No. 08/594,815 filed Jan. 30, 1996 (now U.S. Pat. No. 5,852,565, issued Dec. 22, 1998).

TECHNICAL FIELD

This invention relates to electronic communication systems, and more particularly to an advanced electronic television system having enhanced compression, filtering, and display characteristics.

BACKGROUND

The United States presently uses the NTSC standard for television transmissions. However, proposals have been made to replace the NTSC standard with an Advanced Television standard. For example, it has been proposed that the U.S. adopt digital standard-definition and advanced television formats at rates of 24 Hz, 30 Hz, 60 Hz, and 60 Hz interlaced. It is apparent that these rates are intended to continue (and thus be compatible with) the existing NTSC television display rate of 60 Hz (or 59.94 Hz). It is also apparent that “3-2 pulldown” is intended for display on 60 Hz displays when presenting movies, which have a temporal rate of 24 frames per second (fps). However, while the above proposal provides a menu of possible formats from which to select, each format only encodes and decodes a single resolution and frame rate. Because the display or motion rates of these formats are not integrally related to each other, conversion from one to another is difficult.

Further, this proposal does not provide a crucial capability of compatibility with computer displays. These proposed image motion rates are based upon historical rates which date back to the early part of this century. If a “clean-slate” were to be made, it is unlikely that these rates would be chosen. In the computer industry, where displays could utilize any rate over the last decade, rates in the 70 to 80 Hz range have proven optimal, with 72 and 75 Hz being the most common rates. Unfortunately, the proposed rates of 30 and 60 Hz lack useful interoperability with 72 or 75 Hz, resulting in degraded temporal performance.

In addition, it is being suggested by some that interlace is required, due to a claimed need to have about 1000 lines of resolution at high frame rates, but based upon the notion that such images cannot be compressed within the available 18-19 Mbits/second of a conventional 6 MHz broadcast television channel.

It would be much more desirable if a single signal format were to be adopted, containing within it all of the desired standard and high definition resolutions. However, to do so within the bandwidth constraints of a conventional 6 MHz broadcast television channel requires compression and “scalability” of both frame rate (temporal) and resolution (spatial). One method specifically intended to provide for such scalability is the MPEG-2 standard. Unfortunately, the temporal and spatial scalability features specified within the MPEG-2 standard (and newer standards, like MPEG-4) are not sufficiently efficient to accommodate the needs of advanced television for the U.S. Thus, the proposal for advanced television for the U.S. is based upon the premise that temporal (frame rate) and spatial (resolution) layering are inefficient, and therefore discrete formats are necessary.

Further, it would be desirable to provide enhancements to resolution, image clarity, coding efficiency, and video production efficiency. The present invention provides such enhancements.

SUMMARY

The invention provides a number of enhancements to handle a variety of video quality and compression problems. The following describes a number of such enhancements, most of which are preferably embodied as a set of tools which can be applied to the tasks of enhancing images and compressing such images. The tools can be combined by a content developer in various ways, as desired, to optimize the visual quality and compression efficiency of a compressed data stream, particularly a layered compressed data stream.

Such tools include improved de-interlacing and noise reduction enhancements, including motion analysis.

The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.

DESCRIPTION OF DRAWINGS

FIG. 1A is a block diagram of an odd-field de-interlacer.

FIG. 1B is a block diagram of an even-field de-interlacer.

FIG. 2 is a block diagram of a frame de-interlacer using three de-interlaced fields.

FIG. 3 is a block diagram of a threshold test.

FIG. 4 is a block diagram of a preferred combination of median filters.

FIG. 5 is a diagram of the relative shape, amplitudes, and lobe polarity of a preferred downsizing filter.

FIGS. 6A and 6B are diagrams of the relative shape, amplitudes, and lobe polarity of a pair of preferred upsizing filters for upsizing by a factor of 2.

Like reference symbols in the various drawings indicate like elements.

DETAILED DESCRIPTION

Throughout this description, the preferred embodiment and examples shown should be considered as exemplars, rather than as limitations on the invention.

A number of enhancements may be made to handle a variety of video quality and compression problems. The following describes a number of such enhancements, most of which are preferably embodied as a set of tools which can be applied to the tasks of enhancing images and compressing such images. The tools can be combined by a content developer in various ways, as desired, to optimize the visual quality and compression efficiency of a compressed data stream, particularly a layered compressed data stream.

De-Interlacing and Noise Reduction Enhancements

Overview

Experimentation has shown that many de-interlacing algorithms and devices depend upon the human eye to integrate fields to create an acceptable result. However, since compression algorithms are not a human eye, any integration of de-interlaced fields should take into account the characteristics of such algorithms. Without such careful de-interlaced integration, the compression process will create high levels of noise artifacts, both wasting bits (hindering compression) and making the image look noisy and busy with artifacts. This distinction between de-interlacing for viewing (such as with line-doublers and line-quadruplers) vs. de-interlacing as input to compression has led to the techniques described below. In particular, the de-interlacing techniques described below are useful as input to single-layer non-interlaced MPEG-like compression, as well as to layered MPEG-like compression.

Further, noise reduction must similarly match the needs of being an input to compression algorithms, rather than just reducing noise appearance. The goal is generally to reproduce, upon decompression, no more noise than the original camera or film-grain noise. Equal noise is generally considered acceptable after compression/decompression. Reduced noise, with sharpness and clarity equivalent to the original, is a bonus. The noise reduction described below achieves these goals.

Further, for very noisy shots, such as from high speed film or with high camera sensitivity settings, usually in low light, noise reduction can be the difference between a good looking compressed/decompressed image vs. one which is unwatchably noisy. The compression process greatly amplifies noise which is above some threshold of acceptability to the compressor. Thus, the use of noise-reduction pre-processing to keep noise below this threshold may be required for acceptable, good-quality results.

De-Graining and Noise-Reducing Filters

It has been found through experimentation that applying de-graining and/or noise-reducing filtering before layered or non-layered encoding improves the ability of the compression system to perform. While de-graining or noise-reduction is most effective on grainy or noisy images prior to compression, either process may be helpful when used in moderation even on relatively low noise or low grain pictures. Any of several known de-graining or noise-reduction algorithms may be applied. Examples are “coring”, simple neighbor median filters, and softening filters.

Whether noise-reduction is needed is determined by how noisy the original images are. For interlaced original images, the interlace itself is a form of noise, which usually will require additional noise reduction filtering, in addition to the complex de-interlacing process described below. For progressive scan (non-interlaced) camera or film images, noise processing is useful in layered and non-layered compression when noise is present above a certain level.

There are different types of noise. For example, video transfers from film include film grain noise. Film grain noise is caused by silver grains which couple to yellow, cyan, and magenta film dyes. Yellow affects both red and green, cyan affects both blue and green, and magenta affects both red and blue. Red is formed where yellow and magenta dye crystals overlap. Similarly, green is the overlap of yellow and cyan, and blue is the overlap of magenta and cyan. Thus, noise between colors is partially correlated through the dyes and grains between pairs of colors. Further, when multiple grains overlap in all three colors, as they do on a print in dark regions of the image or on a negative in light regions of the image (dark on the negative), additional color combinations occur. This correlation between the colors can be utilized in film-grain noise reduction, but is a complex process. Further, many different film types are used, and each type has different grain sizes, shapes, and statistical distributions.

For video images created by CCD-sensor and other (e.g., tube) sensor cameras, the red, green, and blue noise is uncorrelated. In this case, it is best to process the red, green, and blue records independently. Thus, red noise is reduced with self-red processing independently of green noise and blue noise; the same approach applies to green and blue noise.

Thus, noise processing is best matched to the characteristics of the noise source itself. In the case of a composite image (from multiple sources), the noise may differ in characteristics over different portions of the image. In this situation, generic noise processing may be the only option, if noise processing is needed.

It has also been found useful in some cases to perform a “re-graining” or “re-noising” process after decoding a compressed layered data stream, as a creative effect, since some de-grained or de-noised images may be “too clean” or “too sterile” in appearance. Re-graining and/or re-noising are relatively easy effects to add in the decoder using any of several known algorithms. For example, this can be accomplished by the addition of low pass filtered random noise of suitable amplitude.
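The re-graining example just mentioned (adding low pass filtered random noise of suitable amplitude) might be sketched as follows for a single scanline. This is only an illustration under stated assumptions: the function name `add_grain`, the uniform noise source, the 3-tap box filter used as the low pass filter, the clamped edges, and the default amplitude are all hypothetical choices, not taken from this description.

```python
import random


def add_grain(row, amplitude=0.02, seed=0):
    """Add low pass filtered random noise to one scanline of pixel values.

    Assumptions (not from the description): uniform noise in [-1, 1],
    a 3-tap box filter as the low pass filter, and clamped edges.
    """
    rng = random.Random(seed)
    noise = [rng.uniform(-1.0, 1.0) for _ in row]
    n = len(noise)
    # Low pass filter the noise by averaging each sample with its neighbors.
    smooth = [
        (noise[max(i - 1, 0)] + noise[i] + noise[min(i + 1, n - 1)]) / 3.0
        for i in range(n)
    ]
    # Scale the filtered noise and add it to the decoded pixel values.
    return [v + amplitude * s for v, s in zip(row, smooth)]
```

Because the filtered noise stays within [-1, 1], the added grain never exceeds the chosen amplitude.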

De-Interlacing Before Compression

As mentioned above, the preferred compression method for interlaced source which is ultimately intended for non-interlaced display includes a step to de-interlace the interlaced source before the compression steps. De-interlacing a signal after decoding in the receiver, where the signal has been compressed in the interlaced mode, is both more costly and less efficient than de-interlacing prior to compression, and then sending a non-interlaced compressed signal. The non-interlaced compressed signal can be either layered or non-layered (i.e., a conventional single layer compression).

Experimentation has shown that filtering a single field of an interlaced source, and using that field as if it were a non-interlaced full frame, gives poor and noisy compression results. Thus, using a single-field de-interlacer prior to compression is not a good approach. Instead, experimentation has shown that a three-field-frame de-interlacer process using field synthesized frames (“field-frames”), with weights of [0.25, 0.5, 0.25] for the previous, current, and next field-frames, respectively, provides a good input for compression. Combining three field-frames may be performed using other weights (although these weights are optimal) to create a de-interlaced input to a compression process.

In the preferred de-interlacing system, a field-de-interlacer is used as the first step in the overall process to create field-frames. In particular, each field is de-interlaced, creating a synthesized frame where the total number of lines in the frame is derived from the half number of lines in a field. Thus, for example, an interlaced 1080 line image will have 540 lines per even and odd field, each field representing 1/60th of a second. Normally, the even and odd fields of 540 lines will be interlaced to create 1080 lines for each frame, which represents 1/30th of a second. However, in the preferred embodiment, the de-interlacer copies each scanline without modification from a specified field (e.g., the odd fields) to a buffer that will hold some of the de-interlaced result. The remaining intermediate scanlines (in this example, the even scanlines) for the frame are synthesized by adding half of the field line above and half of the field line below each newly stored line. For example, the pixel values of line 2 for a frame would each comprise ½ of the summed corresponding pixel values from each of line 1 and line 3. The generation of intermediate synthesized scanlines may be done on the fly, or may be computed after all of the scanlines from a field are stored in a buffer. The same process is repeated for the next field, although the field types (i.e., even, odd) will be reversed.

FIG. 1A is a block diagram of an odd-field de-interlacer, showing that the odd lines from an odd field 10 are simply copied to a de-interlaced odd field 12, while the even lines are created by averaging adjacent odd lines from the original odd field together to form the even lines of the de-interlaced odd field 12. Similarly, FIG. 1B is a block diagram of an even-field de-interlacer, showing that the even lines from an even field 14 are simply copied to a de-interlaced even field 16, while the odd lines are created by averaging adjacent even lines from the original even field together to form the odd lines of the de-interlaced even field 16. Note that this case corresponds to “top field first”; “bottom field first” could also be considered the “even” field.

As a next step, a sequence of these de-interlaced fields is then used as input to a three-field-frame de-interlacer to create a final de-interlaced frame. FIG. 2 is a block diagram showing how the pixels of each output frame are composed of 25% of the corresponding pixels from a previous de-interlaced field (field-frame) 22, 50% of the corresponding pixels from a current field-frame 24, and 25% of the corresponding pixels from the next field-frame 26.
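The field de-interlacer of FIGS. 1A and 1B can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the name `deinterlace_field`, the list-of-rows image layout, the 0-based row indexing selected by `field_is_top`, and the edge handling (a missing neighbor at the top or bottom of the frame is replaced by the one available neighbor) are assumptions that the description does not specify.

```python
def deinterlace_field(field_lines, field_is_top=True):
    """Build a full-height field-frame from the scanlines of one field.

    field_lines: list of rows (lists of pixel values) from a single field.
    field_is_top: True if the field supplies frame rows 0, 2, 4, ...
                  (0-based); False if it supplies rows 1, 3, 5, ...
    """
    height = 2 * len(field_lines)
    frame = [None] * height
    offset = 0 if field_is_top else 1
    # Copy the field's own scanlines without modification.
    for i, row in enumerate(field_lines):
        frame[2 * i + offset] = list(row)
    # Synthesize each missing scanline as half the line above plus half
    # the line below (both neighbors are copied field lines).
    for y in range(height):
        if frame[y] is not None:
            continue
        above = frame[y - 1] if y - 1 >= 0 else None
        below = frame[y + 1] if y + 1 < height else None
        if above is None:  # assumed edge rule: reuse the available neighbor
            above = below
        if below is None:
            below = above
        frame[y] = [0.5 * a + 0.5 * b for a, b in zip(above, below)]
    return frame
```

For example, a two-line field `[[10, 10], [30, 30]]` treated as the top field yields rows 10, 20, 30, 30: the second row is the average of its neighbors, and the last row repeats the only neighbor it has.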
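The three-field-frame combination of FIG. 2 is a per-pixel weighted sum. The sketch below assumes three equally sized field-frames stored as lists of rows; the function name `combine_field_frames` is hypothetical. It operates on whatever values it is given and does not itself convert between video and linear representations.

```python
def combine_field_frames(prev_ff, cur_ff, next_ff, weights=(0.25, 0.5, 0.25)):
    """Weighted per-pixel sum of previous, current, and next field-frames."""
    w_prev, w_cur, w_next = weights
    return [
        [w_prev * p + w_cur * c + w_next * n
         for p, c, n in zip(row_p, row_c, row_n)]
        for row_p, row_c, row_n in zip(prev_ff, cur_ff, next_ff)
    ]
```

With the default weights, a pixel that is 8 in the current field-frame and 0 in its two neighbors becomes 4, while a pixel that is 4 in all three stays 4.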

The new de-interlaced frame then contains far fewer interlace difference artifacts between frames than do the three field-frames of which it is composed. However, there is a temporal smearing caused by adding the previous field-frame and next field-frame into a current field-frame. This temporal smearing is usually not objectionable, especially in light of the de-interlacing improvements which result.

This de-interlacing process is very beneficial as input to compression, either single layer (unlayered) or layered. It is also beneficial just as a treatment for interlaced video for presentation, viewing, or making still frames, independent of use with compression. The picture from the de-interlacing process appears “clearer” than the presentation of the interlace directly, or of the de-interlaced fields.

De-Interlace Thresholding

Although the de-interlace three-field sum weightings of [0.25, 0.5, 0.25] discussed above provide a stable image, moving parts of a scene can sometimes become soft or can exhibit aliasing artifacts. To counteract this, a threshold test may be applied which compares the result of the [0.25, 0.5, 0.25] temporal filter against the corresponding pixel values of only the middle field-frame. If a middle field-frame pixel value differs more than a specified threshold amount from the value of the corresponding pixel from the three-field-frame temporal filter, then only the middle field-frame pixel value is used. In this way, a pixel from the three-field-frame temporal filter is selected where it differs less than the threshold amount from the corresponding pixel of the single de-interlaced middle field-frame, and the middle field-frame pixel value is used when there is more difference than the threshold. This allows fast motion to be tracked at the field rate, and smoother parts of the image to be filtered and smoothed by the three-field-frame temporal filter. This combination has proven an effective, if not optimal, input to compression. It is also very effective for processing for direct viewing to de-interlace image material (also called line doubling in conjunction with display).

The preferred embodiment for such threshold determinations uses the following equations for corresponding RGB color values from the middle (single) de-interlaced field-frame image and the three-field-frame de-interlaced image:

Rdiff=R_single_field_de-interlaced minus R_three_field_de-interlaced

Gdiff=G_single_field_de-interlaced minus G_three_field_de-interlaced

Bdiff=B_single_field_de-interlaced minus B_three_field_de-interlaced

ThresholdingValue=abs(Rdiff+Gdiff+Bdiff)+abs(Rdiff)+abs(Gdiff)+abs(Bdiff)

The ThresholdingValue is then compared to a threshold setting. Typical threshold settings are in the range of 0.1 to 0.3, with 0.2 being most common. FIG. 3 shows a block diagram of this threshold test. The PROCESSING block 30 multiplies the inputs by [0.25, 0.5, 0.25] and sums the results. The SELECTION CONTROL block 32 compares the output 36 of the PROCESSING block 30 with Input B 34 using the above equations for Rdiff, Gdiff, Bdiff, and ThresholdingValue. The switch selects the PROCESSING output 36 if the ThresholdingValue is less than the threshold, otherwise the switch selects Input B 34, the middle value, for the output 38.
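The threshold test of FIG. 3 can be sketched per pixel as below. The function names are hypothetical, and the sketch assumes RGB values on a normalized scale (roughly 0 to 1), which the typical threshold range of 0.1 to 0.3 suggests but the description does not state.

```python
def thresholding_value(single_rgb, three_rgb):
    """ThresholdingValue for one pixel, following the equations above."""
    r_diff = single_rgb[0] - three_rgb[0]
    g_diff = single_rgb[1] - three_rgb[1]
    b_diff = single_rgb[2] - three_rgb[2]
    return (abs(r_diff + g_diff + b_diff)
            + abs(r_diff) + abs(g_diff) + abs(b_diff))


def select_pixel(single_rgb, three_rgb, threshold=0.2):
    """Switch of FIG. 3: keep the three-field-frame (PROCESSING) result
    below the threshold, otherwise fall back to the middle field-frame."""
    if thresholding_value(single_rgb, three_rgb) < threshold:
        return three_rgb
    return single_rgb
```

A red difference of 0.02 gives a ThresholdingValue of 0.04, so the filtered pixel is kept; a red difference of 0.3 gives 0.6, so the middle field-frame pixel is used instead.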

In order to remove noise from this threshold, smooth-filtering the three-field-frame and single-field-frame de-interlaced pictures can be used before comparing and thresholding them. This smooth filtering can be accomplished simply by down filtering (e.g., down filtering by two), and then up filtering (e.g., using a gaussian up-filter by two). This “down-up” smoothed filter can be applied to both the single-field-frame de-interlaced picture and the three-field-frame de-interlaced picture. The smoothed single-field-frame and three-field-frame pictures can then be compared to compute a ThresholdingValue and then thresholded to determine which picture will source each final output pixel.

In particular, the threshold test is used as a switch to select between the single-field-frame de-interlaced picture and the three-field-frame temporal filter combination of single-field-frame de-interlaced pictures. This selection then results in an image where the pixels are from the three-field-frame de-interlacer in those areas where that image differs in small amounts (i.e., below the threshold) from the single field-frame image, and where the pixels are from the single field-frame image in those areas where the three-field-frame differed more than the threshold amount from the single-field-frame de-interlaced pixels (after smoothing).

This technique has proven effective in preserving single-field fast motion details (by switching to the single-field-frame de-interlaced pixels), while smoothing large portions of the image (by switching to the three-field-frame de-interlaced temporal filter combination).

In addition to selecting between the single-field-frame and three-field-frame de-interlaced image, it is also often beneficial to add a bit of the single-field-frame image to the three-field-frame de-interlaced picture, to preserve some of the immediacy of the single field pictures over the entire image. This immediacy is balanced against the temporal smoothness of the three-field-frame filter. A typical blending is to create a new frame by adding 33.33% (⅓) of a single middle field-frame to 66.67% (⅔) of the corresponding three-field-frame smoothed image. This can be done before or after threshold switching, since the result is the same either way, only affecting the smoothed three-field-frame picture. Note that this is effectively equivalent to using a different proportion of the three field-frames, rather than the original three-field-frame weights of [0.25, 0.5, 0.25]. Computing ⅔ of [0.25, 0.5, 0.25] plus ⅓ of [0, 1, 0] yields [0.1667, 0.6666, 0.1667] as the temporal filter for the three field-frames. The more heavily weighted center (current) field-frame brings additional immediacy to the result, even in the smoothed areas which fell below the threshold value. This combination has proven effective in balancing temporal smoothness with immediacy in the de-interlacing process for moving parts of a scene.
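The equivalence between the ⅓ and ⅔ blend and a single re-weighted temporal filter can be checked with exact arithmetic. The helper name `blend_weights` is hypothetical; the sketch uses Python's standard `fractions` module so that no rounding enters the check.

```python
from fractions import Fraction


def blend_weights(filter_weights, single_share):
    """Fold a blend with the middle field-frame into one set of weights.

    single_share is the fraction taken from the middle field-frame alone,
    i.e. from the weights [0, 1, 0]; the rest comes from filter_weights.
    """
    middle_only = [Fraction(0), Fraction(1), Fraction(0)]
    return [
        (1 - single_share) * w + single_share * m
        for w, m in zip(filter_weights, middle_only)
    ]


three_field = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
combined = blend_weights(three_field, Fraction(1, 3))
# combined is [1/6, 2/3, 1/6], i.e. about [0.1667, 0.6667, 0.1667]
```

The exact weights are 1/6, 2/3, and 1/6, which sum to 1 and match the rounded values quoted in the text.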

Use of Linear Filters

Sums, filters, or matrices involving video pictures should take into account the fact that pixel values in video are non-linear signals. For example, the video curve for HDTV can be several variations of coefficients and factors, but a typical formula is the international CCIR XA-11 (now called Rec. 709):

V=1.0993*L^(0.45)−0.0993 for L>0.018051

V=4.5*L for L<=0.018051

where V is the video value and L is linear light luminance.

The variations adjust the threshold (0.018051) a little, the factor (4.5) a little (e.g., 4.0), and the exponent (0.45) a little (e.g., 0.4). The fundamental formula, however, remains the same.
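The formula above, together with its inverse, might be coded as follows. The forward function uses the constants quoted in the text; the inverse is derived here by solving each branch for L and is not given in the description. The function names and the normalized 0 to 1 value range are assumptions.

```python
KNEE_LINEAR = 0.018051           # threshold on L from the formula above
KNEE_VIDEO = 4.5 * KNEE_LINEAR   # the same point expressed as a video value


def linear_to_video(l):
    """Forward curve: linear light luminance L to video value V."""
    if l <= KNEE_LINEAR:
        return 4.5 * l
    return 1.0993 * l ** 0.45 - 0.0993


def video_to_linear(v):
    """Inverse curve (derived, not quoted): video value V back to L."""
    if v <= KNEE_VIDEO:
        return v / 4.5
    return ((v + 0.0993) / 1.0993) ** (1.0 / 0.45)
```

The two branches meet at the knee to within rounding of the quoted constants, so a round trip through both functions returns the starting value up to floating-point error.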

A matrix operation, such as a RGB to/from YUV conversion, implies linear values. The fact that MPEG in general uses the video non-linear values as if they were linear results in leakage between the luminance (Y) and the color values (U and V). This leakage interferes with compression efficiency. The use of a logarithmic representation, such as is used with film density units, corrects much of this problem. The various types of MPEG encoding are neutral to the non-linear aspects of the signal, although their efficiency is affected due to the use of the matrix conversion RGB to/from YUV.

YUV (U=R−Y, V=B−Y) should have Y computed as a linearized sum of 0.59 G, plus 0.29 R, plus 0.12 B (or slight variations on these coefficients). However, U (=R−Y) becomes equivalent to R/Y in logarithmic space, which is orthogonal to luminance. Thus, a shaded orange ball will not vary the U (=R−Y) parameter in a logarithmic representation. The brightness variation will be represented completely in the Luminance parameter, where full detail is provided.

The linear vs. logarithmic vs. video issue impacts filtering. A key point to note is that small signal excursions (e.g., 10% or less) are approximately correct when a non-linear video signal is processed as if it were a linear signal. This is because a piece-wise linear approximation to the smooth video-to-from-linear conversion curve is reasonable. However, for large excursions, a linear filter is much more effective, and produces much better image quality. Accordingly, if large excursions are to be optimally coded, transformed, or otherwise processed, it would be desirable to first convert the non-linear signal to a linear one in order to be able to apply a linear filter.

De-interlacing is therefore much better when each filter and summation step utilizes conversions to linear values prior to filtering or summing. This is due to the large signal excursions inherent in interlaced signals at small details of the image. After filtering, the image signals are converted back to the non-linear video digital representation. Thus, the three-field-frame weighting (e.g., [0.25, 0.5, 0.25] or [0.1667, 0.6666, 0.1667]) should be performed on a linearized video signal. Other filtering and weighted sums of partial terms in noise and de-interlace filtering should also be converted to linear form for computation. Which operations warrant linear processing is determined by signal excursion, and the type of filtering. Image sharpening can be appropriately computed in video or logarithmic non-linear representations, since it is self-proportional. However, matrix processing, spatial filtering, weighted sums, and de-interlace processing should be computed using linearized digital values.

As a simple example, the single field-frame de-interlacer described above computes missing alternate lines by averaging the line above and below each actual line. This average is much more correct numerically and visually if it is done linearly. Thus, instead of summing 0.5 times the line above plus 0.5 times the line below, the digital values are linearized first, then averaged, and then converted back into the non-linear video representation.
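The difference between averaging video values directly and averaging in linear light can be seen with a short sketch. It reuses the Rec. 709-style constants quoted earlier in this description; the inverse curve is derived rather than quoted, and the names `to_linear`, `to_video`, and `synthesize_line` are hypothetical.

```python
def to_video(l):
    """Linear light to video value, using the constants quoted earlier."""
    if l <= 0.018051:
        return 4.5 * l
    return 1.0993 * l ** 0.45 - 0.0993


def to_linear(v):
    """Video value to linear light (inverse derived for this sketch)."""
    if v <= 4.5 * 0.018051:
        return v / 4.5
    return ((v + 0.0993) / 1.0993) ** (1.0 / 0.45)


def synthesize_line(above, below):
    """Average two scanlines in linear light and return video values."""
    return [
        to_video(0.5 * to_linear(a) + 0.5 * to_linear(b))
        for a, b in zip(above, below)
    ]
```

For a large excursion, such as a black pixel (0.0) above a white pixel (1.0), a direct average of the video values gives 0.5, whereas the linearized average gives a video value of roughly 0.71. For equal neighbors the two approaches agree.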

Median Filters

In noise processing, the most useful filter is the median filter. A three element median filter just ranks the three entries, via a simple sort, and picks the middle one. For example, an X (horizontal) median filter looks at the red value (or green or blue) of three adjacent horizontal pixels, and picks the one with the middle-most value. If two are the same, that value is selected. Similarly, a Y (vertical) filter looks in the scanlines above and below the current pixel, and again picks the middle value.

It has been experimentally determined that it is useful to average the results from applying both an X and a Y median filter to create a new noise-reducing component picture (i.e., each new pixel is the 50% equal average of the X and Y medians for the corresponding pixel from a source image).

In addition to X and Y (horizontal and vertical) medians, it is also possible to take diagonal and other medians. However, the vertical and horizontal pixel values are physically closest to any particular pixel, and therefore produce less potential error or distortion than the diagonals. Nevertheless, such other medians remain available in cases where noise reduction is difficult using only the vertical and horizontal medians.
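The averaged X and Y median can be sketched for one color channel as follows. The names `median3` and `xy_median_average` are hypothetical, and clamping at the image border is an assumption, since the description does not say how edge pixels are treated.

```python
def median3(a, b, c):
    """Three element median: sort the entries and pick the middle one."""
    return sorted((a, b, c))[1]


def xy_median_average(img, x, y):
    """50% equal average of the X and Y medians at pixel (x, y).

    img is a list of rows for one color channel; neighbors outside the
    image are replaced by the nearest edge pixel (an assumed edge rule).
    """
    height, width = len(img), len(img[0])
    center = img[y][x]
    left = img[y][max(x - 1, 0)]
    right = img[y][min(x + 1, width - 1)]
    up = img[max(y - 1, 0)][x]
    down = img[min(y + 1, height - 1)][x]
    x_median = median3(left, center, right)
    y_median = median3(up, center, down)
    return 0.5 * x_median + 0.5 * y_median
```

For a 3×3 patch with rows (1, 2, 3), (4, 100, 6), (7, 8, 9), the outlier 100 at the center is replaced by 7.0: the X median is 6, the Y median is 8, and their average is 7.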

Another beneficial source of noise reduction is information from the previous and subsequent frame (i.e., a temporal median). As mentioned below, motion analysis provides the best match for moving regions. However, it is compute intensive. If a region of the image is not moving, or is moving slowly, the red values (and green and blue) from a current pixel can be median filtered with the red value at that same pixel location in the previous and subsequent frames. However, odd artifacts may occur if significant motion is present and such a temporal filter is used. Thus, it is preferred that a threshold be taken first, to determine whether such a median would differ more than a selected amount from the value of a current pixel. The threshold can be computed essentially the same as for the de-interlacing threshold above:

Rdiff=R_current_pixel minus R_temporal_median

Gdiff=G_current_pixel minus G_temporal_median

Bdiff=B_current_pixel minus B_temporal_median

ThresholdingValue=abs(Rdiff+Gdiff+Bdiff)+abs(Rdiff)+abs(Gdiff)+abs(Bdiff)

The ThresholdingValue is then compared to a threshold setting. Typical threshold settings are in the range 0.1 to 0.3, with 0.2 being typical. Above the threshold, the current value is kept. Below the threshold, the temporal median is used. The block diagram of FIG. 3 also applies to this threshold test. In this case the PROCESSING block 30 is a temporal median filter and the inputs are three successive frames. The SELECTION CONTROL block 32 compares the output 36 of the PROCESSING block 30 with Input B 34 using the above equations for Rdiff, Gdiff, Bdiff, and ThresholdingValue. The switch selects the PROCESSING output 36 if the ThresholdingValue is less than the threshold, otherwise the switch selects Input B 34, the middle value, for the output 38.

An additional median type is a median taken between the X, Y, and temporal medians. Another median type can take the temporal median, and then take the equal average of the X and Y medians from it.

Each type of median can cause problems. X and Y medians smear and blur an image, so that it looks “greasy”. Temporal medians cause smearing of motion over time. Since each median can result in problems, yet each median's properties are different (and, in some sense, “orthogonal”), it has been determined experimentally that the best results come by combining a variety of medians.

In particular, FIG. 4 shows a preferred combination of medians: a linear weighted sum (see the discussion above on linear video processing) of five terms to determine the value for each pixel of a current image:
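The thresholded temporal median can be sketched per pixel as below. The function names are hypothetical, and RGB values are assumed to lie on a normalized scale so that a threshold near 0.2 is meaningful.

```python
def median3(a, b, c):
    """Three element median: sort the entries and pick the middle one."""
    return sorted((a, b, c))[1]


def thresholded_temporal_median(prev_rgb, cur_rgb, next_rgb, threshold=0.2):
    """Use the per-channel temporal median unless it differs too much
    from the current pixel, in which case the current pixel is kept."""
    median = tuple(
        median3(p, c, n) for p, c, n in zip(prev_rgb, cur_rgb, next_rgb)
    )
    diffs = [c - m for c, m in zip(cur_rgb, median)]
    value = abs(sum(diffs)) + sum(abs(d) for d in diffs)
    return median if value < threshold else tuple(cur_rgb)
```

A small fluctuation, such as red values of 0.5, 0.56, and 0.52 over three frames, yields a ThresholdingValue of 0.08, so the median (0.52) is used. A large jump, as when a moving object covers the pixel in the current frame only, exceeds the threshold and leaves the current pixel unchanged.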

50% of the original image (Frame N 40) (thus, the most noise reduction is 3 dB, or half);

15% of the average of X and Y medians 42, 44, respectively;

10% of the thresholded temporal median 46;

10% of the average of X and Y medians of the thresholded temporal median (48); and

15% of a three-way X, Y, and temporal median (50).

This set of time medians does a reasonable job of reducing the noise in the image without making it appear “greasy” or blurred, causing temporal smearing of moving objects, or losing detail. Another useful weighting of these five terms is 35%, 20%, 22.5%, 10%, and 12.5%, respectively.

In addition, it is useful to apply motion-compensation by applying center weighted temporal filters to a motion-compensated n×n region, as described below. This can be added to the median filtered image result (of five terms, just described) to further smooth the image, providing better smoothing and detail on moving image regions.
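The five-term weighted sum of FIG. 4 can be sketched for a single pixel value as follows. The sketch assumes the five terms have already been computed (and, per the earlier discussion, converted to linear values); the names `PREFERRED_WEIGHTS`, `ALTERNATE_WEIGHTS`, and `combine_terms` are hypothetical.

```python
# Weights in the order listed above: original image, average of X and Y
# medians, thresholded temporal median, average of X and Y medians of the
# thresholded temporal median, and the three-way X, Y, temporal median.
PREFERRED_WEIGHTS = (0.50, 0.15, 0.10, 0.10, 0.15)
ALTERNATE_WEIGHTS = (0.35, 0.20, 0.225, 0.10, 0.125)


def combine_terms(terms, weights=PREFERRED_WEIGHTS):
    """Linear weighted sum of the five per-pixel terms."""
    if len(terms) != 5 or len(weights) != 5:
        raise ValueError("expected five terms and five weights")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1")
    return sum(w * t for w, t in zip(weights, terms))
```

Both weightings sum to 100%, so a region in which all five terms agree passes through unchanged.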

Motion Analysis

In addition to “in-place” temporal filtering, which does a good job atsmoothing slow-moving details, de-interlacing and noise reduction canalso be improved by use of motion analysis. Adding the pixels at thesame location in three fields or three frames is valid for stationaryobjects. However, for moving objects, if temporal averaging/smoothing isdesired, it is often more optimal to attempt to analyze prevailingmotion over a small group of pixels. For example, an n×n block of pixels(e.g., 2×2, 3×3, 4×4, 6×6, or 8×8) can be used to search in previous andsubsequent fields or frames to attempt to find a match (in the same wayMPEG-2 motion vectors are found by matching 16×16 macroblocks). Once abest match is found in one or more previous and subsequent frames, a“trajectory” and “moving mini-picture” can be determined. For interlacedfields, it is best to analyze comparisons as well as compute inferredmoving mini-pictures utilizing the results of the thresholdedde-interlaced process above. Since this process has already separatedthe fast-moving from the slow-moving details, and has already smoothedthe slow moving details, the picture comparisons and reconstructions aremore applicable than individual de-interlaced fields.

The motion analysis preferably is performed by comparison of an n×nblock in the current thresholded de-interlaced image with all nearbyblocks in the previous and subsequent one or more frames. The comparisonmay be the absolute value of differences in luminance or ROB over then×n block. One frame is sufficient forward and backward if the motionvectors are nearly equal and opposite. However, if the motion vectorsare not nearly equal and opposite, than an additional one or two framesforward and backward can help determine the actual trajectory. Further,different de-interlacing treatments may be useful in helping determinethe “best guess” motion vectors going forward and back. Onede-interlacing treatment can be to use only individual de-interlacedfields, although this is heavily prone to aliasing and artifacts onsmall moving details. Another de-interlacing technique is to use onlythe three-field-frame smooth de-interlacing, without thresholding,having weightings [0.25, 0.5, 0.25], as described above. Althoughdetails are smoothed and sometimes lost, the trajectory may often bemore correct.
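The block comparison described above can be sketched as an exhaustive search that minimizes the sum of absolute differences over an n×n block. This is a simplified illustration on a single channel (for example, luminance): the names `block_sad` and `best_motion_vector`, the small default search range, and the rule of skipping candidates that fall outside the reference frame are assumptions.

```python
def block_sad(cur, ref, x, y, dx, dy, n):
    """Sum of absolute differences between the n-by-n block at (x, y) in
    cur and the block displaced by (dx, dy) in ref."""
    total = 0
    for j in range(n):
        for i in range(n):
            total += abs(cur[y + j][x + i] - ref[y + dy + j][x + dx + i])
    return total


def best_motion_vector(cur, ref, x, y, n=2, search=2):
    """Return (sad, dx, dy) for the best match within +/- search pixels."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate blocks that would fall outside the frame.
            if not (0 <= x + dx and x + dx + n <= width
                    and 0 <= y + dy and y + dy + n <= height):
                continue
            sad = block_sad(cur, ref, x, y, dx, dy, n)
            if best is None or sad < best[0]:
                best = (sad, dx, dy)
    return best
```

Running the search once against a previous frame and once against a subsequent frame gives the two vectors whose agreement (nearly equal and opposite) the text uses to judge whether one frame in each direction is sufficient. The returned minimum difference also serves as the reliability indicator discussed below.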

Once a trajectory is found, a “smoothed n×n block” can be created by temporally filtering using the motion-vector-offset pixels from the one (or more) previous and subsequent frames. A typical filter might again be [0.25, 0.5, 0.25] or [0.1667, 0.6666, 0.1667] for three frames, and possibly [0.1, 0.2, 0.4, 0.2, 0.1] for two frames back and forward. Other filters, with less central weight, are also useful, especially with smaller block sizes (such as 2×2, 3×3, and 4×4). Reliability of the match between frames is indicated by the absolute difference value. Large minimum absolute differences can be used to select more center weight in the filter. Lower values of absolute differences can suggest a good match, and can be used to select less center weight to more evenly distribute the average over a span of several frames of motion-compensated blocks.
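A minimal sketch of the three-frame case follows. The two weight sets are the ones named above; the rule for switching between them (a per-pixel mean absolute difference compared with a hypothetical `threshold`) and the function names are assumptions made for illustration.

```python
def choose_weights(min_sad, n, threshold=4.0):
    """Pick three-frame weights from the best-match absolute difference.

    A large difference (poor match) selects more center weight; a small
    difference (good match) spreads the average more evenly.
    The threshold value is an assumed tuning parameter.
    """
    per_pixel = min_sad / (n * n)
    if per_pixel > threshold:
        return (0.1667, 0.6666, 0.1667)
    return (0.25, 0.5, 0.25)


def smooth_block(prev, cur, nxt, y, x, n, mv_prev, mv_next, weights):
    """Temporally filter the n x n block at (y, x) along its motion trajectory.

    mv_prev and mv_next are (dy, dx) offsets of the matching blocks in the
    previous and next frames.
    """
    w_prev, w_cur, w_next = weights
    out = []
    for j in range(n):
        row = []
        for i in range(n):
            p = prev[y + mv_prev[0] + j][x + mv_prev[1] + i]
            c = cur[y + j][x + i]
            q = nxt[y + mv_next[0] + j][x + mv_next[1] + i]
            row.append(w_prev * p + w_cur * c + w_next * q)
        out.append(row)
    return out
```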

These filter weights can be applied to: individual de-interlaced motion-compensated field-frames; thresholded three-field-frame de-interlaced pictures, described above; and non-thresholded three-field-frame de-interlaced images, with a [0.25, 0.5, 0.25] weighting, also as described above. However, the best filter weights usually come from applying the motion-compensated block linear filtering to the thresholded three-field-frame result described above. This is because the thresholded three-field-frame image is both the smoothest (in terms of removing aliasing in smooth areas), as well as the most motion-responsive (in terms of defaulting to a single de-interlaced field-frame above the threshold). Thus, the motion vectors from motion analysis can be used as the inputs to multi-frame or multi-de-interlaced-field-frame or single-de-interlaced field-frame filters, or combinations thereof. The thresholded multi-field-frame de-interlaced images, however, form the best filter input in most cases.

The use of motion analysis is computationally expensive for a large search region, when fast motion might be found (such as ±32 pixels). Accordingly, it may be best to augment the speed by using special-purpose hardware or a digital signal processor assisted computer.

Once motion vectors are found, together with their absolute difference measure of accuracy, they can be utilized for the complex process of attempting frame rate conversion. However, occlusion issues (objects obscuring or revealing others) will confound matches, and cannot be accurately inferred automatically. Occlusion can also involve temporal aliasing, as can normal image temporal undersampling and its beat with natural image frequencies (such as the “backward wagon wheel” effect in movies). These problems often cannot be unraveled by any known computation technique, and to date require human assistance. Thus, human scrutiny and adjustment, when real-time automatic processing is not required, can be used for off-line and non-real-time frame-rate conversion and other similar temporal processes.

De-interlacing is a simple form of the same problem. Just as with frame-rate conversion, the task of de-interlacing is theoretically impossible to perform perfectly. This is especially due to the temporal undersampling (closed shutter), and an inappropriate temporal sample filter (i.e., a box filter). However, even with correct samples, issues such as occlusion and interlace aliasing further ensure the theoretical impossibility of correct results. The cases where this is visible are mitigated by the depth of the tools, as described here, which are applied to the problem. Pathological cases will always exist in real image sequences. The goal can only be to reduce the frequency and level of impairment when these sequences are encountered. However, in many cases, the de-interlacing process can be acceptably fully automated, and can run unassisted in real-time. Even so, there are many parameters which can often benefit from manual adjustment.

Filter Smoothing of High Frequencies

In addition to median filtering, reducing high frequency detail will also reduce high frequency noise. However, this smoothing comes at the price of loss of sharpness and detail. Thus, only a small amount of such smoothing is generally useful. A filter which creates smoothing can be easily made, as with the threshold for de-interlacing, by down-filtering with a normal filter (e.g., truncated sinc filter) and then up-filtering with a gaussian filter. The result will be smoothed because it is devoid of high frequency picture detail. When such a term is added, it typically must be in very small amounts, such as 5% to 10%, in order to provide a small amount of noise reduction. In larger amounts, the blurring effect generally becomes quite visible.
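The down-up smoothing term can be sketched on a single row of pixels. Several choices here are assumptions made only to keep the example short: a resize factor of 2, a binomial kernel standing in for the gaussian up-filter, edge pixels repeated at the borders, and a blend in which the original row receives the complementary weight `1 - amount`. The function names are likewise hypothetical.

```python
import math


def truncated_sinc_taps(factor=2):
    """Down-filter taps: center lobe plus first negative lobes, unit DC gain."""
    reach = 2 * factor  # sin(x)/x returns to zero at |k| = 2 * factor
    raw = []
    for k in range(-reach + 1, reach):
        x = k * math.pi / factor
        raw.append(1.0 if k == 0 else math.sin(x) / x)
    total = sum(raw)
    return [r / total for r in raw]


def clamp_get(seq, i):
    """Read seq[i], repeating the edge samples outside the valid range."""
    return seq[min(max(i, 0), len(seq) - 1)]


def down_up_smooth(row):
    """Down-filter by 2 with a truncated sinc, then up-filter by 2."""
    taps = truncated_sinc_taps(2)
    half = len(taps) // 2
    low = [
        sum(t * clamp_get(row, i + k - half) for k, t in enumerate(taps))
        for i in range(0, len(row), 2)
    ]
    up = []
    for k in range(len(low)):
        # Positive-only binomial kernel (1, 4, 6, 4, 1) / 8 split by phase.
        up.append((clamp_get(low, k - 1) + 6 * low[k] + clamp_get(low, k + 1)) / 8)
        up.append((low[k] + clamp_get(low, k + 1)) / 2)
    return up[:len(row)]


def reduce_high_frequencies(row, amount=0.075):
    """Blend a small amount (about 5% to 10%) of the smoothed row into the row."""
    smooth = down_up_smooth(row)
    return [(1 - amount) * v + amount * s for v, s in zip(row, smooth)]
```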

Base Layer Noise Filtering

The filter parameters for the median filtering described above for an original image should be matched to the noise characteristics of the film grain or image sensor that captured the image. After this median filtered image is down-filtered to generate an input to the base layer compression process, it still contains a small amount of noise. This noise may be further reduced by a combination of another X-Y median filter (equally averaging the X and Y medians), plus a very small amount of the high frequency smoothing filter. A preferred filter weighting of these three terms, applied to each pixel of the base layer, is:

70% of the original base layer (down filtered from median-filtered original above);

22.5% of the average of X and Y medians; and

7.5% of the down-up smoothing filter.

This small amount of additional filtering in the base layer provides a small additional amount of noise reduction and improved stability, resulting in better MPEG encoding and limiting the amount of noise added by such encoding.
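The three-term weighting can be sketched per pixel as shown here. The three-pixel median windows, the repeated edge pixels, and the function names are assumptions. The smoothed image is taken as a precomputed input, and the weight on the base layer is computed as whatever remains after the two filter terms, an assumption that keeps the weights summing to 100%.

```python
from statistics import median


def xy_median_average(img):
    """Average of a 3-pixel horizontal median and a 3-pixel vertical median."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            mx = median([img[y][max(x - 1, 0)], img[y][x], img[y][min(x + 1, w - 1)]])
            my = median([img[max(y - 1, 0)][x], img[y][x], img[min(y + 1, h - 1)][x]])
            row.append(0.5 * (mx + my))
        out.append(row)
    return out


def filter_base_layer(base, smoothed, w_median=0.225, w_smooth=0.075):
    """Weighted sum of the base layer, its X-Y median average, and a smoothed copy.

    `smoothed` is the output of a down-up smoothing filter, supplied by the caller.
    """
    w_base = 1.0 - w_median - w_smooth  # remainder, so the weights sum to one
    med = xy_median_average(base)
    return [
        [w_base * b + w_median * m + w_smooth * s for b, m, s in zip(rb, rm, rs)]
        for rb, rm, rs in zip(base, med, smoothed)
    ]
```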

Image Filtering

Downsizing and Upsizing Filters

Experimentation has shown that the downsizing filter used in creating a base layer from a high resolution original picture is most optimal if it includes modest negative lobes and an extent which stops after the first very small positive lobes after the negative lobes. FIG. 5 is a diagram of the relative shape, amplitudes, and lobe polarity of a preferred downsizing filter. The down filter essentially is a center-weighted function which has been truncated to a center positive lobe 500, a symmetric pair of adjacent (bracketing) small negative lobes 502, and a symmetric pair of adjacent (bracketing) very small outer positive lobes 504. The absolute amplitude of the lobes 500, 502, 504 may be adjusted as desired, so long as the relative polarity and amplitude inequality relationships shown in FIG. 5 are maintained. However, a good first approximation for the relative amplitudes is defined by a truncated sinc function (sinc(x)=sin(x)/x). Such filters can be used separably, which means that the horizontal data dimension is independently filtered and resized, and then the vertical data dimension, or vice versa; the result is the same.

When creating a base layer original (as input to the base layer compression) from a low-noise high resolution original input, the preferred downsizing filter has first negative lobes which are of a normal sinc function amplitude. For clean and for high resolution input images, this normal truncated sinc function works well. For lower resolutions (e.g., 1280×720, 1024×768, or 1536×768), and for noisier input pictures, a reduced first negative lobe amplitude in the filters is more optimal. A suitable amplitude in such cases is about half the truncated sinc function negative lobe amplitude. The small first positive lobes outside of the first negative lobes are also reduced to lower amplitude, typically to ½ to ⅔ of the normal sinc function amplitude. The effect of reducing the first negative lobes is the main issue, since the small outside positive lobes do not contribute to picture noise. Further samples outside the first positive lobes preferably are truncated to minimize ringing and other potential artifacts.

The choice of whether to use milder negative lobes or full sinc function amplitude negative lobes in the downfilter is determined by the resolution and noise level of the original image. It is also somewhat a function of image content, since some types of scenes are easier to code than others (mainly related to the amount of motion and change in a particular shot). By using a “milder” downfilter having reduced negative lobes, noise in the base layer is reduced, and a cleaner and quieter compression of the base layer is achieved, thus also resulting in fewer artifacts.
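As a rough illustration of the downsizing filter shape, the taps of a truncated sinc can be generated and the negative and outer positive lobes scaled separately. The function name `down_filter_taps`, the parameters `neg_scale` and `pos_scale`, the sampling of the sinc at multiples of π divided by the resize factor, and the normalization to unit gain are all assumptions of this sketch.

```python
import math


def down_filter_taps(factor=2, neg_scale=1.0, pos_scale=1.0):
    """Truncated-sinc taps for downsizing by an integer factor.

    The taps cover the center positive lobe, the first negative lobes, and the
    first small outer positive lobes; everything beyond is truncated.
    neg_scale and pos_scale reduce the negative and outer positive lobes to
    form a "milder" filter.
    """
    reach = 3 * factor  # sin(x)/x is zero at |x| = 3 * pi, where the filter stops
    raw = []
    for k in range(-reach + 1, reach):
        x = k * math.pi / factor
        value = 1.0 if k == 0 else math.sin(x) / x
        lobe = abs(k) // factor  # 0 = center, 1 = negative, 2 = outer positive
        if lobe == 1:
            value *= neg_scale
        elif lobe == 2:
            value *= pos_scale
        raw.append(value)
    total = sum(raw)
    return [v / total for v in raw]  # normalize so flat areas are unchanged
```

Calling `down_filter_taps(2)` gives a full-amplitude filter for clean, high resolution input, while `down_filter_taps(2, neg_scale=0.5, pos_scale=0.5)` gives a milder variant of the kind suggested for noisier or lower resolution input. Because the filter is separable, the same taps could be applied first along rows and then along columns.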

Experimentation has also shown that the optimal upsizing filter has a center positive lobe with small adjacent negative lobes, but no further positive lobes. FIGS. 6A and 6B are diagrams of the relative shape, amplitudes, and lobe polarity of a pair of preferred upsizing filters for upsizing by a factor of 2. A central positive lobe 600, 600′ is bracketed by a pair of small negative lobes 602, 602′. An asymmetrically placed positive lobe 604, 604′ is also required. These paired upfilters could also be considered to be truncated sinc filters centered on the newly created samples. For example, for a factor of two upfilter, two new samples will be created for each original sample. The small adjacent negative lobes 602, 602′ have less negative amplitude than is used in the corresponding downsizing filter (FIG. 5), or than would be used in an optimal (sinc-based) upsizing filter for normal images. This is because the images being upsized are decompressed, and the compression process changes the spectral distribution. Thus, more modest negative lobes, and no additional positive lobes beyond the middle ones 600, 600′, work better for upsizing a decompressed base layer.

Experimentation has shown that slight negative lobes 602, 602′ provide a better layered result than positive-only gaussian or spline upfilters (note that splines can have negative lobes, but are most often used in the positive-only form). Thus, this upsizing filter preferably is used for the base layer in both the encoder and the decoder.

Weighting of High Octave of Picture Detail

In the preferred embodiment, the signal path which expands the original uncompressed base layer input image uses a gaussian upfilter rather than the upfilter described above. In particular, a gaussian upfilter is used for the “high octave” of picture detail, which is determined by subtracting the expanded original base-resolution input image (without using compression) from the original picture. Thus, no negative lobes are used for this particular upfiltered expansion.

As noted above, for MPEG-2 this high octave difference signal path is typically weighted with 0.25 (or 25%) and added to the expanded decompressed base layer (using the other upfilter described above) as input to the enhancement layer compression process. However, experimentation has shown that weights of 10%, 15%, 20%, 30%, and 35% are useful for particular images when using MPEG-2. Other weights may also prove useful. For MPEG-4, it has been found that filter weights of 4-8% may be optimal when used in conjunction with other improvements described below. Accordingly, this weighting should be regarded as an adjustable parameter, depending upon the encoding system, the scenes being encoded/compressed, the particular camera (or film) being used, and the image resolution.

Filters with Negative Lobes For Motion Compensation in MPEG-2 and MPEG-4

In MPEG-4, reference filters have been implemented for shifting macroblocks when finding the best motion vector match, and then using the matched region for motion compensation. MPEG-4 video coding, like MPEG-2, supports ½ pixel resolution of motion vectors for macroblocks. Unlike MPEG-2, MPEG-4 also supports ¼ pixel accuracy. However, in the reference implementation of MPEG-4, the filters used are sub-optimal. In MPEG-2, the half-way point between pixels is just the average of the two neighbors, which is a sub-optimal box filter. In MPEG-4, this filter is used for ½ pixel resolution. If ¼ pixel resolution is invoked in MPEG-4 Part 2, a filter with negative lobes is used for the half-way point, but a sub-optimal box filter with this result and the neighboring pixels is used for the ¼ and ¾ points.

Further, the chrominance channels (U=R−Y and V=B−Y) do not use any sub-pixel resolution in the motion compensation step under MPEG-4. Since the luminance channel (Y) has resolution to the ½ or ¼ pixel, the half-resolution chrominance U and V channels should be sampled using filters to ¼ pixel resolution, corresponding to ½ pixel in luminance. When ¼ pixel resolution is selected for luminance, then ⅛ pixel resolution should be used for U and V chrominance.

Experiments have shown that the effects of filtering are significantly improved by using a negative lobe truncated sinc function (as described above) for filtering the ¼, ½, and ¾ pixel points when doing ¼ pixel resolution in luminance, and by using similar negative lobes when doing ½ pixel resolution for the filter which creates the ½ pixel position.
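A sub-pixel displacement filter of this kind can be sketched with four taps that span the center lobe and the first negative lobes of a sinc centered on the fractional position. The four-tap support, the unit-gain normalization, the repeated edge pixels, and the names `subpixel_taps` and `interpolate` are assumptions made for illustration; they are not taken from the MPEG reference implementations.

```python
import math

# Taps are applied to the pixels at these offsets from the left neighbor.
OFFSETS = (-1, 0, 1, 2)


def subpixel_taps(frac):
    """Truncated-sinc taps for a sample at position frac (0 <= frac < 1).

    frac is the distance to the right of the left neighboring pixel,
    e.g. 0.25, 0.5, or 0.75 for quarter-pixel motion vectors.
    """
    raw = []
    for n in OFFSETS:
        x = math.pi * (n - frac)
        raw.append(1.0 if x == 0 else math.sin(x) / x)
    total = sum(raw)
    return [r / total for r in raw]  # normalize so flat areas are unchanged


def interpolate(row, i, frac):
    """Value at position i + frac in row, repeating edge pixels at the borders."""
    last = len(row) - 1
    taps = subpixel_taps(frac)
    return sum(t * row[min(max(i + n, 0), last)] for t, n in zip(taps, OFFSETS))
```

Under these assumptions the half-pixel taps come out as approximately −0.25, 0.75, 0.75, −0.25, in contrast to the 0.5, 0.5 of the box filter that simply averages the two neighbors.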

Similarly, effects of filtering are significantly improved by using a negative lobe truncated sinc function for filtering the ⅛-pixel points for U and V chrominance when using ¼ pixel luminance resolution, and by using ¼ pixel resolution filters with similar negative lobe filters when using ½ pixel luminance resolution.

It has been discovered that the combination of quarter-pixel motion vectors with truncated sinc motion compensated displacement filtering results in a major improvement in picture quality. In particular, clarity is improved, noise and artifacts are reduced, and chroma detail is increased.

These filters may be applied to video images under MPEG-1, MPEG-2, MPEG-4 or any other appropriate motion-compensated block-based image coding system.

COMPUTER IMPLEMENTATION

The invention may be implemented in hardware or software, or a combination of both. However, preferably, the invention is implemented in computer programs executing on one or more programmable computers each comprising at least a processor, a data storage system (including volatile and non-volatile memory and/or storage elements), an input device, and an output device. Program code is applied to input data to perform the functions described herein and generate output information. The output information is applied to one or more output devices, in known fashion.

Each such program may be implemented in any desired computer language (including machine, assembly, or high level procedural, logical, or object oriented programming languages) to communicate with a computer system. In any case, the language may be a compiled or interpreted language.

Each such computer program is preferably stored on a storage media or device (e.g., ROM, CD-ROM, or magnetic or optical media) readable by a general or special purpose programmable computer system, for configuring and operating the computer when the storage media or device is read by the computer system to perform the procedures described herein. The inventive system may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer system to operate in a specific and predefined manner to perform the functions described herein.

A number of embodiments of the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. For example, while the preferred embodiment uses MPEG-2 or MPEG-4 coding and decoding, the invention will work with any comparable standard that provides equivalents of I, P, and/or B frames and layers. Accordingly, it is to be understood that the invention is not to be limited by the specific illustrated embodiment, but only by the scope of the appended claims.

1. A method for enhancing image quality in an image encoding system, including: applying a temporal median filter to corresponding pixel values of a previous digital video image, a current digital video image, and a next digital video image to create a noise-reduced digital video image; comparing the difference between each corresponding pixel value of each noise-reduced digital video image and each corresponding current digital video image to a threshold value to generate a difference value; and selecting, for each final pixel value for the noise-reduced digital video image, a corresponding pixel value from the current digital video image if the difference value is within a first threshold comparison range, and a corresponding pixel value from the noise-reduced digital video image if the difference value is within a second threshold comparison range.
2. A method for enhancing image quality in an image encoding system, including: applying a temporal median filter to corresponding pixel values of a previous digital video image, a current digital video image, and a next digital video image to create a noise-reduced digital video image; comparing the difference between each corresponding pixel value of each noise-reduced digital video image and each corresponding current digital video image to a threshold value to generate a difference value; and selecting, for each final pixel value for the noise-reduced digital video image, a corresponding pixel value from the current digital video image if the difference value is within a first threshold comparison range, and a corresponding pixel value from the noise-reduced digital video image if the difference value is within a second threshold comparison range, wherein the threshold value is selected from the range of approximately 0.1 to approximately 0.3.
3. A method for enhancing image quality in an image encoding system, including creating a noise-reduced digital video image comprising a linear weighted sum of five terms: a current digital video image; an average of horizontal and vertical medians of the current digital video image; a thresholded temporal median; an average of horizontal and vertical medians of the thresholded temporal median; and a median of the thresholded temporal median and horizontal and vertical medians of the current digital video image, wherein the weights of the five terms are approximately 50%, 15%, 10%, 10%, and 15%, respectively.
4. A method for enhancing image quality in an image encoding system, including creating a noise-reduced digital video image comprising a linear weighted sum of five terms: a current digital video image; an average of horizontal and vertical medians of the current digital video image; a thresholded temporal median; an average of horizontal and vertical medians of the thresholded temporal median; and a median of the thresholded temporal median and horizontal and vertical medians of the current digital video image, wherein the weights of the five terms are approximately 35%, 20%, 22.5%, 10%, and 12.5%, respectively.
5. A method for enhancing image quality in an image encoding system, including: creating a noise-reduced digital video image comprising a linear weighted sum of five terms: a current digital video image; an average of horizontal and vertical medians of the current digital video image; a thresholded temporal median; an average of horizontal and vertical medians of the thresholded temporal median; and a median of the thresholded temporal median and horizontal and vertical medians of the current digital video image; determining a motion vector for each n×n pixel region of the current digital video image with respect to at least one previous digital video image and at least one subsequent digital video image; applying a center weighted temporal filter to each n×n pixel region of the current digital video image and corresponding motion-vector offset n×n pixel regions of the at least one previous digital video image and at least one subsequent digital video image to create a motion-compensated image; and adding the motion-compensated image to the noise-reduced digital video image.
6. A method for enhancing image quality in an image encoding system, including: determining a motion vector for each n×n pixel region of a current digital video image with respect to at least one previous digital video image and at least one subsequent digital video image; and applying a center weighted temporal filter to each n×n pixel region of the current digital video image and corresponding motion-vector offset n×n pixel regions of the at least one previous digital video image and at least one subsequent digital video image to create a motion-compensated image, wherein each digital video image is a three-field-frame de-interlaced image.
7. A method for enhancing image quality in an image encoding system, including: determining a motion vector for each n×n pixel region of a current digital video image with respect to at least one previous digital video image and at least one subsequent digital video image; and applying a center weighted temporal filter to each n×n pixel region of the current digital video image and corresponding motion-vector offset n×n pixel regions of the at least one previous digital video image and at least one subsequent digital video image to create a motion-compensated image, wherein each digital video image is a thresholded three-field-frame de-interlaced image.
8. A method for enhancing image quality in an image encoding system, including: determining a motion vector for each n×n pixel region of a current digital video image with respect to at least one previous digital video image and at least one subsequent digital video image; and applying a center weighted temporal filter to each n×n pixel region of the current digital video image and corresponding motion-vector offset n×n pixel regions of the at least one previous digital video image and at least one subsequent digital video image to create a motion-compensated image, wherein the center weighted temporal filter is a three-image temporal filter having weights for each of such images of approximately 25%, 50%, and 25%, respectively.
9. A method for enhancing image quality in an image encoding system, including: determining a motion vector for each n×n pixel region of a current digital video image with respect to at least one previous digital video image and at least one subsequent digital video image; and applying a center weighted temporal filter to each n×n pixel region of the current digital video image and corresponding motion-vector offset n×n pixel regions of the at least one previous digital video image and at least one subsequent digital video image to create a motion-compensated image, wherein the center weighted temporal filter is a five-image temporal filter having weights for each of such images of approximately 10%, 20%, 40%, 20%, and 10%, respectively.
10. A method for enhancing image quality in an image encoding system, including: applying a normal down filter to an image to create a first intermediate image; applying a Gaussian up filter to the first intermediate image to create a second intermediate image; and adding a weighted fraction of the second intermediate image to a selected image to create an image having reduced high frequency noise.
11. The method of claim 10, wherein the weighted fraction is between approximately 5% and 10% of the second intermediate image.
12. A method for enhancing image quality in an image system, the method comprising: applying a filter to horizontal pixel values of a digital video image to generate a one-half filtered pixel value for a one-half pixel displacement for motion compensation, the filter including a center positive lobe and a pair of negative lobes bracketing the center positive lobe, wherein an absolute amplitude of each of the negative lobes is less than an absolute amplitude of the center positive lobe.
13. The method of claim 12, wherein the pair of negative lobes are symmetric.
14. The method of claim 12, further comprising: applying the filter to vertical pixel values of a digital video image to generate a second one-half filtered pixel value.
15. A device for an image system having a one-half pixel displacement, the device comprising: one or more processing devices configured to implement: a filter configured to generate a one-half filtered pixel value for motion compensation from horizontal pixel values, the filter including a center positive lobe and a pair of negative lobes bracketing the center positive lobe, wherein an absolute amplitude of each of the negative lobes is less than an absolute amplitude of the center positive lobe.
16. The device of claim 15, wherein the pair of negative lobes are symmetric.
17. The device of claim 15, wherein the filter is further configured to generate a second one-half filtered pixel value from vertical pixel values.
18. The device of claim 15, wherein the device is configured to decode a digital video image.